Time-Series Data Sampling Methods: Sampling Methods in Data Science

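This article discusses the importance of sampling time-series data, explains how to sample effectively in data science projects, and covers the key methods for drawing samples from raw time-series data. Whether you use Java or Python, understanding these methods is essential for optimizing the performance of AI models.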

Context

In most studies, it is hard (or sometimes impossible) to analyse a whole population, so researchers use samples instead. In statistics, survey sampling is the process by which we get a sample from our population in order to conduct a survey. As data scientists, we usually use data that was previously collected, so we don’t spend much time thinking about how to actually do this. As we will see in this article, however, our data can have different biases depending on how it was sampled, so you had better understand the implications of each of these sampling designs. There are many ways of drawing those samples and, depending on the context, some can be better than others.
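To make the idea of working from a sample rather than the whole population concrete, here is a minimal sketch in Python. The population is synthetic (the "income" values, the sizes, and the helper name `simple_random_sample` are assumptions made for illustration, not something from the original article), and the sample is drawn with the standard library's `random.sample`.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical population: 10,000 synthetic values (illustrative only).
rng = random.Random(0)
population = [rng.gauss(50_000, 12_000) for _ in range(10_000)]

# Study a sample of 500 instead of all 10,000 elements.
sample = simple_random_sample(population, n=500, seed=1)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(f"population mean: {population_mean:.0f}")
print(f"sample mean:     {sample_mean:.0f}")
```

With a random draw like this, the sample mean should land close to the population mean. How close depends on the sample size and on how the sample was drawn, which is exactly why the sampling design matters.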


Probability x non-probability

There are two broad categories of sampling designs: probability and non-probability. In probability sampling, each element of the population has a known, non-zero probability of being in the sample. This method is usually preferable, since its properties, such as bias and sampling error, are usually known. In non-probability sampling, some elements of the population may have no chance of being selected, and there is a great risk that the sample will not represent the population as a whole. However, probability sampling is not always possible, and sometimes it is simply cheaper to sample non-randomly.
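The difference between the two categories can be checked empirically. The sketch below is only an illustration under stated assumptions: the population is a list of 20 indices, the probability design is simple random sampling (inclusion probability n / N), and the non-probability design is a hypothetical "convenience" draw in which only the first 8 elements are reachable. The function names are invented for this example.

```python
import random
from collections import Counter


def inclusion_frequencies(draw, population_size, trials):
    """Estimate how often each index ends up in the sample over many trials."""
    counts = Counter()
    for _ in range(trials):
        counts.update(draw())
    return [counts[i] / trials for i in range(population_size)]


N, n, trials = 20, 5, 20_000
rng = random.Random(42)
indices = list(range(N))


# Probability design: simple random sampling.
# Every element has the same known inclusion probability, n / N = 0.25.
def srs():
    return rng.sample(indices, n)


# Non-probability design (hypothetical convenience sample): only the first
# 8 elements are reachable, so the other 12 can never be selected.
def convenience():
    return rng.sample(indices[:8], n)


srs_freq = inclusion_frequencies(srs, N, trials)
conv_freq = inclusion_frequencies(convenience, N, trials)

print("SRS min/max frequency:      ", min(srs_freq), max(srs_freq))
print("convenience, unreachable part:", set(conv_freq[8:]))
```

Under the random design every element shows up at roughly the same rate, while under the convenience design the unreachable elements have an inclusion frequency of exactly zero, so no amount of repeated sampling will ever represent them.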


Let’s now take a look at some of the different sampling designs in each category and their pr
